fix: properly join url and path when calculating full url by lmossman · Pull Request #692 · airbytehq/airbyte-python-cdk

Lake Mossman (lmossman) · 2025-08-04T17:37:05Z

What

A user raised a bug with the current logic for handling Request Path cursor page token injection: https://airbyte1416.zendesk.com/agent/tickets/13510

If the "next page token" is a full URL whose scheme, domain, or port differs from the url field of the stream's HttpRequester (in the example below, the only difference is an explicit :443 port), we currently concatenate the two naively, which usually results in a failed request.

Here is an example from the zendesk ticket above:

Currently, their API Endpoint URL for that stream is https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting
But the Cursor Pagination returns a "next page" URL that looks like this: https://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3a00%3a00.000000&cursor=UhsMQmz0rZtvj7oCJ1e7DLe2O7v5nb5Gco1YD9NCTtU%253D
Notice that the next page URL has :443 after the .com, but the API Endpoint URL doesn't.
This results in the second page request being sent to https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_postinghttps://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3A00%3A00.000000&cursor=UdaBrnaiXuKw%252BFyD6FzERZba7d%252FPA0HuIoON3QbfZwI%253D&pageSize=5 (which is just the concatenation of the API Endpoint URL and the next page token)
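The failure mode can be reproduced outside the CDK with plain string concatenation. The sketch below is illustrative only: the variable names are assumptions, and the query string is replaced by a short placeholder (`cursor=abc`) rather than the real cursor from the ticket.

```python
from urllib.parse import urlsplit

# URLs from the ticket example; the query string is a shortened placeholder.
api_endpoint_url = "https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting"
next_page_token = (
    "https://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?cursor=abc"
)

# Naive concatenation, mirroring the old behavior described above.
concatenated = api_endpoint_url + next_page_token

# The second URL ends up embedded in the path of the first one.
parts = urlsplit(concatenated)
print(parts.netloc)
print(parts.path)
```

The host is still parsed from the API Endpoint URL, while the whole next page URL (up to its query string) is absorbed into the path, so the request targets a resource that does not exist.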

How

To fix this, I simply modify the logic to call _join_url() on the url and path instead of naively concatenating them together.

This fixes the issue, because _join_url() will prefer the path if it contains its own full http scheme and domain, which is what we want in this case.

This also has a side-benefit of correctly handling the case where the url does not have a trailing / and the path does not have a leading / - the old implementation would not insert a / between these, whereas the new implementation does.
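The joining rules described above can be sketched roughly as follows. This is not the CDK's `_join_url()` implementation; `join_url_sketch` is a hypothetical helper written only to illustrate the two behaviors named in this description (prefer a path that is already a full URL, otherwise join with exactly one `/`). The handling of an empty path is an assumption.

```python
from urllib.parse import urlsplit


def join_url_sketch(url: str, path: str) -> str:
    """Hypothetical approximation of the joining rules; not the CDK code."""
    # Assumption: an empty path leaves the URL unchanged.
    if not path:
        return url
    split = urlsplit(path)
    # A path that carries its own http(s) scheme and host is used as-is.
    if split.scheme in ("http", "https") and split.netloc:
        return path
    # Otherwise join the two parts with exactly one slash between them.
    return url.rstrip("/") + "/" + path.lstrip("/")


base = "https://example.com/api"
full = "https://example.com:443/api/items?cursor=abc"

print(join_url_sketch(base, "items"))         # no slash on either side
print(join_url_sketch(base + "/", "/items"))  # slash on both sides
print(join_url_sketch(base, full))            # full URL in the path wins
```

The first two calls both produce `https://example.com/api/items`, and the third returns the full next page URL unchanged instead of appending it to the base.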

Testing

I have added unit tests to validate this fix, and you can reproduce the situation with this manifest: https://gist.github.com/lmossman/404b656c3e5726ddebc026eae118b7f8

Summary by CodeRabbit

Bug Fixes
- Improved URL joining logic to ensure consistent and reliable construction of request URLs in all scenarios.
Tests
- Added new tests to verify correct URL formation when using different combinations of base URLs and paths.

github-actions · 2025-08-04T17:37:16Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-url-path-joining#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-url-path-joining

Helpful Resources

CDK API Reference

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/poe <command> - Runs any poe command in the CDK environment


Copilot

Pull Request Overview

This PR fixes a bug in URL construction where full URLs from cursor pagination tokens were incorrectly concatenated with base URLs, causing request failures. The fix ensures proper URL joining by using the existing _join_url() method instead of naive string concatenation.

Replaces string concatenation with _join_url() method call in the _get_url() method
Adds comprehensive unit tests to validate the URL joining behavior with various scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
airbyte_cdk/sources/declarative/requesters/http_requester.py	Updates URL construction logic to use `_join_url()` method for proper URL handling
unit_tests/sources/declarative/requesters/test_http_requester.py	Adds test cases to validate URL joining behavior with different path scenarios

airbyte_cdk/sources/declarative/requesters/http_requester.py

github-actions · 2025-08-04T17:44:38Z

PyTest Results (Fast)

3 699 tests +4	3 688 ✅ +4	6m 37s ⏱️ +12s
1 suites ±0	11 💤 ±0
1 files ±0	0 ❌ ±0

Results for commit 5d2d4f2. ± Comparison against base commit 6c0d36d.

coderabbitai · 2025-08-04T17:46:27Z

📝 Walkthrough

Walkthrough

The _get_url method in HttpRequester was updated to consistently use the _join_url method for combining a base URL and a path, regardless of whether the base URL comes from url_base or url. A new parameterized test was added to verify this behavior when using url instead of url_base.

Changes

Cohort / File(s)	Change Summary
HttpRequester URL Construction Logic `airbyte_cdk/sources/declarative/requesters/http_requester.py`	Modified `_get_url` to always use `_join_url` for joining `url` and `path` when `url_base` is not present.
Unit Tests for URL Joining `unit_tests/sources/declarative/requesters/test_http_requester.py`	Added a parameterized test to verify correct URL joining when initialized with `url` instead of `url_base`.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Case
    participant HttpRequester as HttpRequester
    participant HttpClient as HTTP Client

    Test->>HttpRequester: Initialize with url and path
    Test->>HttpRequester: Make request
    HttpRequester->>HttpRequester: _get_url (uses _join_url)
    HttpRequester->>HttpClient: Send request with joined URL
    HttpClient-->>Test: Return prepared request (assert URL)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Possibly related PRs

fix: (CDK) (HttpRequester) - fix trailing slash for url_base when no path has been provided #412: Both PRs modify URL joining logic in HttpRequester, focusing on consistent behavior and edge cases. Wdyt about reviewing these together for context?

Suggested labels

airbyte-python-cdk, airbyte-python-cdk/low-code/http-requester

Suggested reviewers

chandlerprall
artem1205

Would you like to consider looping in anyone else for this review, or does this cover the main stakeholders? Wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0d36d and 5d2d4f2.

📒 Files selected for processing (2)

airbyte_cdk/sources/declarative/requesters/http_requester.py (1 hunks)
unit_tests/sources/declarative/requesters/test_http_requester.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Check: source-shopify
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)

🔇 Additional comments (2)

airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

171-177: LGTM! This change properly addresses the URL concatenation issue described in the PR.

The modification to consistently use _join_url() for both url_base + path and url + path scenarios is a solid improvement. This ensures that when pagination tokens contain full URLs, they're handled correctly rather than being naively concatenated. The existing _join_url implementation already handles the edge cases well, including full URLs in the path parameter.

unit_tests/sources/declarative/requesters/test_http_requester.py (1)

864-906: Excellent test coverage for the URL joining fix! 🎯

This parameterized test does a great job covering the key scenarios that the PR aims to fix, especially the cases where the path contains a full URL (test cases 3 & 4). The test structure mirrors the existing test_join_url nicely and provides comprehensive coverage for the new code path when using url instead of url_base.

The scenarios you've chosen directly address the pagination URL concatenation issue mentioned in the PR objectives, wdyt?


github-actions · 2025-08-04T17:49:43Z

PyTest Results (Full)

3 702 tests +4	3 691 ✅ +4	11m 40s ⏱️ ±0s
1 suites ±0	11 💤 ±0
1 files ±0	0 ❌ ±0

Results for commit 5d2d4f2. ± Comparison against base commit 6c0d36d.

Christo Grabowski (ChristoGrab)

LGTM!


properly join url and path when calculating full url

5d2d4f2

github-actions bot added the bug and security labels Aug 4, 2025

Lake Mossman (lmossman) marked this pull request as ready for review August 4, 2025 17:44

Lake Mossman (lmossman) requested review from Christo Grabowski (ChristoGrab) and Copilot August 4, 2025 17:44

Copilot AI reviewed Aug 4, 2025


coderabbitai bot approved these changes Aug 4, 2025


Christo Grabowski (ChristoGrab) approved these changes Aug 4, 2025


Lake Mossman (lmossman) merged commit 950acc6 into main Aug 4, 2025
29 of 30 checks passed

Lake Mossman (lmossman) deleted the lmossman/fix-url-path-joining branch August 4, 2025 20:05

Lake Mossman (lmossman) added a commit that referenced this pull request Aug 5, 2025

Revert "fix: properly join url and path when calculating full url (#692…

34a0130

…)" This reverts commit 950acc6.

This was referenced Aug 5, 2025

fix: revert builder concurrency and pagination url changes #694

Merged

fix: url and path joining for cursor paginators #695

Merged

Lake Mossman (lmossman) added a commit that referenced this pull request Aug 5, 2025

Revert "fix: properly join url and path when calculating full url (#692…

513fef1

…)" This reverts commit 950acc6.
